Acquiring Bilingual Lexica from Keyword Listings

Authors

  • Filip Gralinski
  • Krzysztof Jassem
  • Roman Kurc
Abstract

This paper presents a new method for acquiring bilingual dictionaries from on-line text corpora. The method combines rule-based techniques for obtaining dictionaries from structured data, such as paper dictionaries in electronic form or on-line glossaries, with methods used by alignment tools such as GIZA. The basic idea is to search for anchor words, such as "abstract" or "keywords", that are followed by their equivalents in another language. Text fragments that follow anchor words are likely to supply new entries for bilingual lexica.
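
The anchor-word idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions. The English anchor "Keywords" and its Polish counterpart "Słowa kluczowe" are hypothetical choices. So are the comma/semicolon separators, the assumption that both listings give their items in the same order, and the helper names extract_listing, pair_listings and dice_lexicon. The abstract says only that the method borrows from alignment tools such as GIZA. Here, a plain Dice co-occurrence score pooled over many listings stands in for that statistical step. GIZA-style aligners estimate IBM translation models, which this sketch does not attempt.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical anchor words: an English anchor and its assumed Polish equivalent.
ANCHORS = {"en": "Keywords", "pl": "Słowa kluczowe"}


def extract_listing(text: str, anchor: str) -> list[str]:
    """Return the normalized items listed after 'anchor:' on the same line."""
    match = re.search(rf"{re.escape(anchor)}\s*:\s*(.+)", text)
    if match is None:
        return []
    # Assumption: items are separated by commas or semicolons.
    items = (item.strip().rstrip(".").lower() for item in re.split(r"[;,]", match.group(1)))
    return [item for item in items if item]


def pair_listings(text: str) -> list[tuple[str, str]]:
    """Rule-based step: pair items by position when both listings have equal length."""
    en = extract_listing(text, ANCHORS["en"])
    pl = extract_listing(text, ANCHORS["pl"])
    if en and len(en) == len(pl):
        return list(zip(en, pl))
    return []  # lengths differ: leave this document to the statistical step


def dice_lexicon(docs: list[str], threshold: float = 0.8) -> dict[tuple[str, str], float]:
    """Statistical step (a stand-in for GIZA-style alignment): score every
    cross-language item pair by its Dice coefficient over all documents."""
    en_freq: Counter = Counter()
    pl_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for doc in docs:
        en = set(extract_listing(doc, ANCHORS["en"]))
        pl = set(extract_listing(doc, ANCHORS["pl"]))
        en_freq.update(en)
        pl_freq.update(pl)
        pair_freq.update(product(en, pl))  # every en item co-occurs with every pl item
    scores = {}
    for (e, p), together in pair_freq.items():
        dice = 2 * together / (en_freq[e] + pl_freq[p])
        if dice >= threshold:  # the 0.8 cut-off is an arbitrary illustrative choice
            scores[(e, p)] = dice
    return scores
```

Consider two toy headers, "Keywords: machine translation, lexicon" / "Słowa kluczowe: tłumaczenie maszynowe, leksykon" and a second pair that shares "machine translation" but adds "corpus" / "korpus". For these, pair_listings aligns items by position, while dice_lexicon keeps only the three pairs whose items always appear together. Positional pairing is cheap but breaks when the listings differ in length or order. Pooling counts over many documents tolerates that noise, at the cost of needing more data.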

Publication date: 2009